Abstract. We assess the effect of automated formative evaluations on reading
comprehension skills in a course of English for Specific Purposes (ESP) in the area of
kinesiology at the Universidad Austral de Chile - Valdivia (UACh). The evaluations
were implemented using Questionmark's Perception (QMP) (Questionmark-Corporation,
2015). We investigate: (1) Do formative reading comprehension
assessments enhance students' reading comprehension skills? (2) How do students
perceive QMP? The experimental design used for this study was pre-test/post-test
with control group. The participants were 57 freshman kinesiology students from
UACh, randomly divided into two groups: G1 (experimental) and G2 (control). After the
pre-test, G1 worked on 11 online reading comprehension modules, which included
formative evaluations with automated immediate feedback, while G2 did the same
work with printed materials. At the end, both groups took the same post-test. The
results show that there were no statistically significant differences between the
mean grade differences (post-test grade - pre-test grade) of G1 and G2. G1’s surveys
showed positive attitudes towards the use of automated formative evaluations. Our
conclusions are that, for our population, the use of computer technology was at least
as effective as instruction without technology. Furthermore, QMP was satisfactorily
evaluated by the students, and it allowed the professor to monitor students and promptly
detect those with performance problems thanks to the different reports it provides.

Keywords: formative evaluations, English as a foreign language, online reading
comprehension, English for specific purposes.

1. Introduction 

Formative assessment has long been believed to be effective, as documented in 
Black and Wiliam (1998). However, recent studies, like Kingston and Nash (2011), 
challenge that belief. This controversy, along with the introduction of multiple 
technological tools that implement evaluations, motivated us to test one such tool: 
QMP (Questionmark-Corporation, 2015), applied to reading comprehension in 
ESP. From a curricular viewpoint, our institution currently applies a competencies-based
model (Jabif, 2007; UACh, 2007), which adopts a holistic, integrating vision,
with methodologies that are based more on student learning than on the professor’s
teaching. Therefore, the use of computer-based formative assessments is consistent
with UACh’s policies.

The process we followed to select QMP and descriptive statistics of the study 
are presented in Lazzeri et al. (2015). Here we present a more comprehensive 
analysis of our population composition and a performance comparison considering 
the entrance skill level of the students. We also study the students’ preferences 
according to their survey answers, and consider other advantages that
Computer-Based Assessment (CBA) software, such as QMP, can offer.

Our research questions are: (1) Do formative reading comprehension assessments 
enhance students' reading comprehension skills? (2) How do students perceive 
QMP? 

2. Method 

2.1. Experimental design 

Study Type: Pre-test/post-test with control group. 

Population: 57 freshmen kinesiology students from UACh; Age: average=19, 
SD=2.5; Gender: 51% male, 49% female; High-school type: 38% public, 56% 
subsidized private, 6% private. Only 19% had a CEFR certification at the ALTE 
A2 or B1 levels, which are the goal levels specified by the Chilean government for
elementary and high-school graduates, respectively. 

Our population was randomly divided into two groups: G1 (experimental) and G2
(control). After the pre-test, G1 worked on seven lessons that were implemented as
11 online reading comprehension modules, which included formative evaluations
with automated immediate feedback, while G2 did the same work with printed 
materials. The only difference was the presentation mode of the material and the 
automated feedback. Data was collected from a pre-test, post-test, and survey 
application. 

2.2. Instruments 

The pre-test and post-test were developed and graded by the course’s instructor. 
The students’ satisfaction survey was developed by data analysis specialists. The 
lessons, used as learning measurement instruments, were designed by the course 
instructor and implemented by the software administrator in QMP automated 
formative assessment modules with immediate feedback. Each lesson contained 
a paper/reading in the field of kinesiology and several exercises related to that 
reading. Table 1 shows the composition of each lesson in terms of length of paper 
in words and types of exercises used. 


Table 1. Lesson composition (- = no exercises of that type)

                Number of exercises for each type of question
Lesson  Paper   Essay  Multiple  Column  Cloze  Brief  Survey  T/F  Essay with  Total
        word           choice    match          text   matrix       extra text
        count
1       1167    -      2         17      10     -      -       10   -           39
2       3047    -      20        10      10     45     -       -    -           85
3       5363    17     -         20      10     15     -       10   -           72
4       4111    -      -         10      10     10     -       10   10          50
5       4480    -      -         10      -      30     -       10   10          60
6       5116    10     10        10      10     30     10      10   20          110
7       5504    3      -         10      20     50     -       20   -           103
Total   28788   30     32        87      70     180    10      70   40          519


To determine improvement in reading comprehension skills, we used the dependent 
variable “Academic Performance on Reading Comprehension” as measured by the 
grade obtained in the exams, given on a 1-7 scale, where 7 is best, which is the
standard in the Chilean educational system. More precisely, we used the difference
in academic performance between the post-test and the pre-test results. G1 and 
G2 were compared in terms of this variable using Student’s t-test for independent 
samples, since the preconditions to use this kind of test were satisfied. The 
independent variable was “use of QMP” (Yes/No). The statistical analysis was 
carried out with SPSS 11.5. 

The students’ satisfaction survey contained 8 Likert-style questions directly
related to the use of QMP that allowed us to get the students’ perceptions about the
platform. The internal consistency of this survey was determined by computing
Cronbach’s alpha reliability coefficient, which was .747 and is deemed acceptable.
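
As an illustration of the reliability coefficient mentioned above, the following is a minimal Python sketch of Cronbach's alpha, computed as k/(k-1) times one minus the ratio of the summed item variances to the variance of the respondents' total scores. The function name `cronbach_alpha` and the sample answers are hypothetical; the study itself used SPSS, and the survey's raw responses are not reproduced here.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents. Population variance is used
    # throughout; the alpha ratio is the same with sample variance.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert answers (4 respondents x 3 items), NOT study data.
sample = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(sample)
print(round(alpha, 3))
```

When every item ranks the respondents identically, the function returns 1; lower values indicate that the items agree less with one another.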

3. Discussion 

Table 2 summarizes the results of both groups in the pre-test and post-test. 

Table 2. Pre-test and post-test results

                Post-test       Pre-test
Group   n       Mean    Sdev    Mean    Sdev
G1      28      5.60    0.97    2.96    0.92
G2      29      5.51    0.76    3.24    0.69


It is important to notice that G1 had on average lower scores in the pre-test than 
G2, but after completing the lessons using the QMP modules, they got a higher 
average than G2 (using printed materials) in the post-test. Nevertheless, there were 
no statistically significant differences between the mean grade differences with 
95% confidence (t=1.41, p=0.16>0.05), as shown in Table 3.


Table 3. Statistical comparison

                                                     Levene’s test for      t-test for
                                          Standard   equality of variances  equality of means
Variable                 Group  N   Mean  deviation  F      Sig.            t      Sig.
Difference               G1     28  2.64  0.95
(Post-test - Pre-test)   G2     29  2.27  1.00       0.90   0.79            1.41   0.16
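
The comparison in Table 3 can be approximated from the reported summary statistics alone. The sketch below computes the equal-variance (pooled) Student's t statistic for the two groups' gain scores. The function name `pooled_t` is hypothetical, and the critical value 2.004 is an assumption taken from standard t tables for 55 degrees of freedom. Because the inputs are rounded to two decimals, the result should be close to, but not necessarily identical to, the value SPSS produced from the raw data.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# Gain scores (post-test minus pre-test) as reported in Table 3.
t_stat, df = pooled_t(2.64, 0.95, 28, 2.27, 1.00, 29)

# Assumption: approximate two-sided 5% critical value for df = 55 (t table).
T_CRIT_55 = 2.004
significant = abs(t_stat) > T_CRIT_55
print(f"t = {t_stat:.2f}, df = {df}, significant at 5%: {significant}")
```

The statistic falls below the critical value, which agrees with the conclusion that the difference between G1 and G2 is not statistically significant.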


The answers to the questions related to the students’ perceptions about QMP for G1 
are summarized in Table 4. 

Table 4. Survey results

Students’ perception about the use of QMP                   Strongly    Disagree    Neutral     Agree       Strongly
                                                            disagree                                        agree
                                                            n    %      n    %      n    %      n    %      n    %
I liked learning using the methodology based on QMP         1    3.6    1    3.6    8    28.6   13   46     5    17.9
Seeing my classmates’ progress motivates me to work         0    0      3    10.7   9    32.1   14   50     2    7.1
At the end of each session I feel that I have learned       0    0      3    10.7   14   50     7    25     4    14.3
The automatic feedback from the platform helps my learning  1    3.6    0    0      8    28.6   7    25     12   42.9
Using QMP made me feel more confident about my knowledge    1    3.6    6    21.4   10   35.7   7    25     4    14.3
I like to have the control over my learning process         0    0      0    0      4    14.3   10   36     14   50
I used the trial and error method as a source of learning.  1    3.6    0    0      4    14.3   19   68     4    14.3
The platform QMP met my expectations.                       0    0      5    17.9   10   35.7   11   39     2    7.1


We can highlight that 64% liked the QMP-based methodology, 86% enjoyed
controlling their learning process, and 82% used trial and error as a learning
strategy. These percentages are obtained by adding the “Agree” and “Strongly
Agree” answers for each question. Furthermore, in a separate question, 89.3%
recommended that their peers volunteer for the QMP evaluation process. Another
positive aspect of QMP is its ability to generate useful reports, such as
the Test Analysis Report, partially shown in Figure 1, which shows diverse test
statistics and a reliability analysis.
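
The agreement percentages quoted above can be reproduced from the counts in Table 4 by adding the "Agree" and "Strongly Agree" answers and dividing by the number of respondents. The following short Python sketch does this for the three highlighted items. The dictionary keys and the function name `agreement_pct` are hypothetical labels, and the "Strongly Disagree" count for the trial-and-error item is inferred from its reported 3.6% share of 28 respondents.

```python
# Counts from Table 4, ordered:
# Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree.
SURVEY_COUNTS = {
    "liked QMP-based methodology": [1, 1, 8, 13, 5],
    "like controlling my learning process": [0, 0, 4, 10, 14],
    # First count inferred from the reported 3.6% of 28 respondents.
    "used trial and error": [1, 0, 4, 19, 4],
}


def agreement_pct(counts):
    """Percentage of respondents answering Agree or Strongly Agree."""
    return 100 * (counts[3] + counts[4]) / sum(counts)


for item, counts in SURVEY_COUNTS.items():
    print(f"{item}: {agreement_pct(counts):.0f}%")
```

Rounded to whole numbers, the three results match the 64%, 86%, and 82% reported in the text.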

4. Conclusions 

Despite not finding statistically significant performance improvements, we can 
conclude that, for our population, the use of computer technology was at least 
as effective as instruction without technology, which coincides with some of the 
findings in Grgurovic, Chapelle, and Shelley (2013). Furthermore, QMP was 
satisfactorily evaluated by the students. QMP also allowed the professor to monitor
students and promptly detect those with performance problems thanks to the different
reports it provides, which offered relevant information such as students’ performance
for each exercise and formative evaluation, and the items that proved to be easiest or
most difficult, among others.

Figure 1. QMP’s test analysis report

[Figure: partial view of the QMP Test Analysis Report, with a Table of Test Statistics (number of examinees, number of items, mean, median, standard deviation, variance, test reliability) and a reliability table.]


Since this is the first time that technology in the area of computer-based formative
evaluation has been introduced in the Faculty of Medicine at UACh in a course
of ESP, we present a preliminary evaluation in this area, showing that the use of
technology contributes to learning in a different way, more compatible with today’s 
demands from the digital world. However, we cannot generalize the results at this 
point. 